1. The attached report is for your information and retention.

2. The objective of this study was to develop a more reliable
system of evaluating helicopter pilots' flight performance by putting
emphasis on standardized and objective measures, which would also
provide a diagnostic record of student performance.

3. This report is considered to be of primary interest to those
organizations and agencies concerned with helicopter pilot training.

The Pilot Performance Description Records (PPDR) have proved useful in
administering check rides in primary helicopter training. The system
provides a means of diagnosing specific sources of a student's
end-of-phase deficiencies, by the detailed recording of his flight
performance.

4. The report serves to standardize pilot proficiency evaluation
through reducing subjective differences in scoring procedures.

The Human Resources Research Office is a nongovernmental agency of The George Washington
University, operating under contract with the Department of the Army (DA 44-188-ARO-2). HumRRO's
mission, stated by AR 70-8, is to conduct studies and research in the fields of training, motivation,
leadership, and man-weapons system analysis.

Research is reported by HumRRO in publications of several types.

1. Technical Reports are prepared at the completion of a research Task or major portion thereof.
They are designed specifically for a military audience and convey recommendations for Army action.

2. Research Reports may be prepared at any time during a Task. They are designed primarily
for a research audience but may be of interest to a military audience. They report research findings
of interest and value to the scientific community and do not recommend Army action.

3. Research Memoranda may be prepared at any time and need not be directly associated with
a particular research Task. They report findings that may be of interest to a research or military
audience or to both. They do not recommend Army action.

4. Consulting Reports are prepared following completion of a specifically requested consulting
action under HumRRO's Technical Advisory Services. They are designed for a specific military
audience and usually convey recommendations for Army action.

5. Research Bulletins are prepared as nontechnical summaries of one or more research Tasks
or as reports of other HumRRO activities. They are intended primarily for a military audience and do
not present recommendations for Army action. Their distribution usually includes agencies and
individuals conducting research, and the general public.

Technical Reports and Research Bulletins may be requested from the Director's Office, which
also issues a complete bibliography. Other publications may be obtained from the Director of Research
of the originating Unit or Division.


PROBLEM 

Improvement in the efficiency of the Army's primary helicopter training program
depends to a large degree on the reliability of flight training evaluation. The traditional
flight check has consisted of an evaluation of the flight by the check pilot not on the
basis of a uniform series of maneuvers and measures, but on the basis of his personal
specifications. It seemed probable that the unreliability of the traditional method of
evaluation, which had been repeatedly demonstrated, was due primarily to this lack of
standardization. This study was initiated to develop a more reliable system of evaluating
helicopter pilots' flight performance, by emphasizing standardized and objective
measures which also provide a diagnostic record of student performance.

PROCEDURE 

Training grades and check flight grades were analyzed for Army helicopter pilots
at both the U.S. Army Aviation School (USAAVNS), Fort Rucker, Ala., in 1956-57 and at
the U.S. Army Primary Helicopter School (USAPHS), Camp Wolters, Tex., in 1957. In
general, the relationships between the training grades and the corresponding test grades
proved to be little better than zero. In another analysis, it was found that ratings of
students' flight performance reflected the standards of evaluation applied by individual check
pilots more than they did the students' flying skill.

The first step in the development of a more effective method of flight evaluation
was an analysis of the light helicopter training program content into fundamental training
maneuvers and maneuver components. Simple scales of several types were developed for
use by the check pilot in recording the students' performance on each of these components.
Where it was possible, direct instrument observations were recorded. However, many
evaluations are necessarily based on individual judgment, to a lesser or greater degree;
where judgments were required, the performance being evaluated was defined as
specifically as possible at each point on the scale in order to narrow the range of personal
interpretation in assigning ratings.

The next step was the development of a format for an Intermediate and an Advanced
Pilot Performance Description Record (PPDR). Each PPDR was based on a standard
ride, that is, the same maneuvers flown in the same sequence. The scales included as
PPDR items were those judged to be most critical to successful performance in each
maneuver. The number of scales that an expert check pilot could safely observe and
record during a check ride was used as the basis for setting the total number of PPDR
items (most items were recorded as the operation was being accomplished, but on
operations that are considered hazardous, recording was delayed until completion of the
dangerous portion).

The PPDR's were then tested by administering check rides to 40 Intermediate and
35 Advanced students at the Primary Helicopter School (Camp Wolters) in 1957. Each
student was administered one ride by a LIFT research staff pilot and one ride by a
military check pilot assigned to USAPHS.

The PPDR's were revised on the basis of experience in the first administration,
and the revised PPDR's were evaluated in 1958. Check pilots were given one week of
training in the use of the PPDR system, with emphasis placed on identification and reduction
of check pilot differences in scoring standards. Two successive rides, each with a
different USAPHS check pilot, were given to 50 Intermediate and 50 Advanced students.

Several approaches to summarizing the data on student performance which the PPDR
check rides provided were explored. One was simply to total the number of errors recorded
on the PPDR in a check flight. A second weighted items according to difficulty. In
another approach ("error pattern-weighted") the pilot rated the student's over-all
performance on a maneuver segment, taking into consideration not only errors but their sequence
and combination; these segment ratings were weighted according to difficulty and
importance of the maneuver. Finally, the check pilot assigned an over-all judgmental rating,
based upon a review of the detailed PPDR record of the student's performance, and
comparable to the "traditional" score.
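
The first two summary scores described above can be sketched in Python. This is only a minimal illustration under stated assumptions: the item names, the error record, and the difficulty weights are hypothetical, and the summary does not say how the actual weights were derived.

```python
# Hypothetical PPDR record for one check ride: item -> True if an error
# was recorded. Item names are illustrative, not taken from the PPDR.
errors = {
    "hover_altitude": True,
    "takeoff_heading": False,
    "approach_airspeed": True,
    "autorotation_rpm": False,
}

# Hypothetical difficulty weights, one per item (assumed values).
difficulty_weight = {
    "hover_altitude": 1.0,
    "takeoff_heading": 1.5,
    "approach_airspeed": 2.0,
    "autorotation_rpm": 3.0,
}


def total_errors(record):
    """First approach: count the errors recorded on the ride."""
    return sum(1 for made_error in record.values() if made_error)


def weighted_errors(record, weights):
    """Second approach: sum the weights of the items on which an error occurred."""
    return sum(weights[item] for item, made_error in record.items() if made_error)


print(total_errors(errors))                        # 2
print(weighted_errors(errors, difficulty_weight))  # 3.0
```

The error pattern-weighted and over-all judgmental ratings depend on the check pilot's judgment and are not reducible to a formula of this kind.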

FINDINGS 

1. Improved reliability of flight proficiency evaluation resulted from the use of the
PPDR system.

2. The PPDR system provided a means of diagnosing specific sources of a student's
end-of-phase deficiencies, by recording, in detail, the student's performance on his flight
check rides.

3. Check pilots who were completely familiar with the PPDR were reliably more
similar in their evaluation of proficiency than were check pilots who were only oriented
to the PPDR.


CONCLUSIONS 

1. The PPDR flight evaluation system can provide an evaluation of helicopter
students' flight performance that is at an acceptable level of reliability. The resulting
diagnostic data provide the basis for determining flight deficiencies of individual students
and for maintaining uniform standards for both instruction and evaluation.

2. To maximize the effectiveness of the PPDR system, it is necessary that
personnel serving as check pilots be trained in the concepts, objectives, and techniques of
the system, and in administering and scoring the PPDR's.


RECOMMENDATION 


It is recommended that the PPDR system be adopted in primary helicopter training
and further developed.* Special emphasis should be given to (a) training check pilots
thoroughly in the PPDR system, especially on scoring PPDR's, and (b) developing a
system for processing the diagnostic data both for debriefing students and for maintaining
standards of instruction and check pilots' evaluation.


*A quality control program based upon a revised version of the Pilot Performance Description
Record system has been devised and is being implemented at the U.S. Army Primary Helicopter
School at Camp Wolters. Experience has been obtained in using the PPDR check ride at Camp
Wolters over the past two years, both in research and in operation. No safety problems have arisen
during administration of the program, and it has been generally well accepted by check pilots. A
report on this program, designated as Subtask LIFT IV, is in preparation.

Chapter 1


THE  FLIGHT  PROFICIENCY  EVALUATION  SYSTEM 


INTRODUCTION

How reliable and how analytic is the traditional flight check as a
measure of flight proficiency? This question has been of particular
significance to flight training administrators charged with the
responsibility for continually trying to improve the flight check system used
in both fixed and rotary wing flight training in the Army.

The flight check is a measure of student progress given at the
end of each phase of pilot training, administered by an experienced
check pilot who acts as observer and safety pilot. Under the
traditional procedure, which was studied during early stages of LIFT
research (1956-57), the student flies a sample, selected by the check
pilot, of the flight maneuvers taught in the preceding phase. The check
pilot grades the student on the basis of his personal evaluation of the
student's performance, both on circumscribed aspects of the flight
and on the over-all performance.

The nonstandardized nature of the traditional flight check, in
which each check pilot follows personal standards in grading, has been
criticized in the past as a source of unreliability in evaluating flight
performance. The nature of in-flight proficiency evaluation makes it
impossible to eliminate variations in the test due to different aircraft,
shifting weather conditions, and transient check pilot or student moods;
the evaluation process itself is, of necessity, complex. However, many
of the causes of variability (those resulting from different test
components and different check pilot standards) are subject to control. To the
extent that the traditional grading system is unnecessarily unreliable,
flight proficiency measurement will be less valuable as a means of
identifying the weaknesses and strengths of students and instructors
and for pinpointing shortcomings in the Program of Instruction (POI).
As a result, flight training will be less efficient than it can be (in terms
of amount of training per dollar spent).

The argument between advocates of "subjectivism" and of
"objectivism" in flight proficiency measurement has been going on in research
and training circles for years. Lt. J.M. Brown of the Royal Canadian
Air Force (i) says:

To understand the origin of this controversy one has to go back about
twenty-five years in the short history of aviation to the time when flying
itself was subjective. At that time the success of a pilot depended upon
how well he could fly his aircraft by feel, or quite literally, "by the
seat of his pants." Flight instruments and radio information was meager
and, with the exception of a crude compass, navigation aids consisted
largely of railroad tracks and grain elevators.


Essentially there was no difference between civilian and military flying.
All that was required of a student was that he solo the aircraft safely;
after that, he was on his own; and if he became lost or caught away
from base in unfavorable weather he simply landed in the nearest field.
RCAF "Wings" standard at the time was reached in less than 100 hours
of flying, three-quarters of which was solo practice. With such a
leisurely program, pilot training could be operated quite successfully
by experienced personnel without an elaborate system of flying
assessments. Indeed, flying was an art and it was considered that assessments
of proficiency could be made only by expert pilots on intuitive bases.


Despite the misgivings of researchers and a few flight training
administrators as to the reliability of the traditional evaluation system,
there had been little change in military flight training evaluation over the
years. Substitute measures developed through research were difficult,
and sometimes unsafe, to administer. There has also been the usual
human "resistance to change"; in fact, many flight training personnel
have not viewed the shortcomings of the system as being serious enough
to indicate real need for change.

The study described in this report is, in effect, a continuation of
earlier flight proficiency measurement work.* It has been carried
out in the Army Aviation training context with the aim of answering
these questions:

(1) How reliable is the traditional flight check system?

(2) Can standardized, objective, practicable measures of
flight proficiency be developed that will increase both the
reliability and the general diagnostic capacity of flight
training evaluation?

THE  TRADITIONAL  FLIGHT  CHECK  SYSTEM 
The  Flight  Check 

The flight check is a test of student progress given at the end of
each phase of training. Under the traditional system, the student is
required to fly a sample, or perhaps all, of the flight maneuvers he
has been taught in the preceding phase.

The check pilot usually records his judgments of the student's
performance on a check grade slip after the check ride is completed.
Generally, check pilots do not take notes during the flight. Examples
of two check grade slips, representing two levels of evaluation
specificity, are presented in Figures 1 and 2. After each maneuver,
maneuver part, or specific aspect of flight performance listed on the
check grade slip, the check pilot records a grade which represents his
judgment of the student's performance. Finally, an over-all judgment
of the check ride is recorded. This grade is computed as an average
of the grades for individual maneuvers or is determined subjectively,
without computations, by the check pilot. The check pilot usually
explains low grades on the back of the check grade slip.

*Much of the research that provided the starting point for this study is summarized in
Appendix A.

[Figure 1: check grade slip]

Check pilots may belong to a special check section. In some units,
to become a member of the check section one must be a highly
experienced instructor, exceptionally competent and familiar with the training
program; in other units, requirements are not stipulated. In many
flight training organizations there is no formal check section; check
pilots may simply be instructors who happen to be available when a
check ride is due. In a few training programs all but final check rides
are administered by the instructor and are not, in the strictest sense,
formal check rides.

The  Training  Grade 

Students are graded by their instructors on their daily performance
throughout training. These daily flight training grades and the
instructor's written comments are recorded on grade slips which are identical
in format with the check grade slips in Figures 1 and 2.

Training grades would be expected to be relatively trustworthy
because they are based on many observations by the instructor of the
student's performance. However, when the same instructor administers
to a student all phases of training in a training stage, a substantial
amount of "halo effect" can result. That is, a grade given after a
training ride in the latter part of the training phase is likely to be influenced
not only by present but also by past performance as remembered by
the instructor or as reflected in past training grades. Thus, a reliable
check ride, administered by an expert evaluator (other than the
instructor), applying a uniform set of standards, is needed as a final independent
judgment of the student's proficiency.

Reliability  of  the  Traditional  Flight  Check  System 

There is a reasonable basis for the view that students generally
may be classified as good, average, or poor throughout training, from
stage to stage and, even more clearly, within a stage (i.e., from
training to test grades in a given level of training). Perfect consistency in
the individual student's performance is not to be expected since various
stages of training require different kinds, as well as levels, of skill
from the pilot. However, certain perceptual, psychomotor, and mental
skills are basic to all flying, whether it is primary, instrument, or
tactical. If the evaluation system is adequate there should be an
appreciable relationship between a student's training and check grades at
different levels of training. Such relationships would not be evident if
unreliable measures were used.

Twelve years of flight training research, conducted primarily on
Air Force pilot training, and summarized by Ericksen (7) and
Ben-Avi (1), indicate that the correlations between check grades at the
completion of training and earlier check or training grades are rarely
greater than .30.*

*Studies conducted by the Air Force and other research personnel are briefly summarized
in Appendix A.

THE FLIGHT CHECK SYSTEM USED
IN ARMY HELICOPTER TRAINING

Reliability  of  the  Army  Flight  Check  System 

To make preliminary tests of check system reliability, the
interrelationships among flight training grades and check grades in
primary helicopter training were analyzed by research personnel of
the U.S. Army Aviation Human Research Unit in 1956 and 1957 at the
U.S. Army Aviation School at Fort Rucker, Ala. At that time primary
helicopter training was accomplished in three phases: Pre-Solo,
Intermediate, and Advanced. A phase check ride was given at the end of
each phase.

The interrelationships among the training grades and check grades
for a hundred students are presented in Table 1.

Table 1

Correlations for Primary Rotary Wing Flight Training Grades
and Check Grades at Fort Rucker, 1956-1957

(N = 100)

[Table body not legible in the source. Rows and columns are Training Grade
and Check Grade, each at the Pre-Solo, Intermediate, and Advanced phases.]

Training grades were obtained by averaging the daily dual ride grades for
each student for each training phase. The check grades were those recorded in the
grade books. The relationships between the training grades and the
corresponding check grades were .35, .08, and .09 at the Pre-Solo,
Intermediate, and Advanced stages respectively. Thus, the average
training-test relationship was little better than zero.* In fact, the
average of all interrelationships between training and check grades in
Table 1 is of the same order of magnitude.
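
As a rough illustration of the computation behind these figures, the sketch below defines a product-moment correlation and averages the three training-check correlations quoted in the text. The sample grades are hypothetical, and the simple arithmetic mean is an assumption; the report does not state how its average was formed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical phase data: mean daily training grade and check grade
# for five students (illustrative values only).
training_grades = [70, 75, 80, 85, 90]
check_grades = [72, 70, 85, 80, 88]
print(pearson_r(training_grades, check_grades))

# Training-check correlations reported in the text, by phase.
reported = {"Pre-Solo": 0.35, "Intermediate": 0.08, "Advanced": 0.09}
average_r = sum(reported.values()) / len(reported)
print(round(average_r, 2))  # 0.17
```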

*Throughout this report, a value is considered to be reliable (reflecting a true value) if it
is significant at the .05 level of confidence or less. For example, if the true correlation were
zero, an obtained correlation as large or larger than one marked significant would be expected to
occur five or fewer times in a hundred. However, a correlation of small magnitude, even though
reliable, is not generally useful for prediction.
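
The convention in this footnote can be illustrated with a standard large-sample check. The sketch below uses the Fisher z approximation with a two-sided test, which is an assumption made for illustration; the report does not say which significance test was applied to its correlations. The function name `is_reliable` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def is_reliable(r, n, alpha=0.05):
    """Two-sided test of H0: true correlation = 0, via the Fisher z approximation."""
    z = atanh(r) * sqrt(n - 3)          # approximately standard normal under H0
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_two_sided <= alpha


# Training-check correlations quoted in the text (N = 100).
for phase, r in [("Pre-Solo", 0.35), ("Intermediate", 0.08), ("Advanced", 0.09)]:
    print(phase, is_reliable(r, n=100))
# Pre-Solo True
# Intermediate False
# Advanced False
```

Under this approximation only the Pre-Solo correlation would count as reliable, and even that value is small enough to fall under the footnote's caution about usefulness for prediction.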
A correlation analysis was made of the grades of 55 students at
the U.S. Army Primary Helicopter School (USAPHS) at Camp Wolters,
Tex. in 1957.¹ The relationships among training grades were generally
about the same as those for the U.S. Army Aviation School; relationships
between training grades and check grades were slightly higher.²,³
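
Because Camp Wolters used letter grades, the correlation had to be computed on integer codes, as the accompanying footnote explains. A minimal sketch of that conversion follows. The particular coding (U = 1 through AA = 4) is an assumption, since the footnote says only that successive integers were assigned, and the student grades shown are hypothetical.

```python
from math import sqrt

# Assumed coding: successive integers in order of increasing proficiency.
LETTER_TO_INT = {"U": 1, "BA": 2, "A": 3, "AA": 4}


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical letter grades for six students (not data from the report).
training = ["A", "AA", "BA", "A", "U", "AA"]
check = ["A", "A", "BA", "AA", "BA", "AA"]

x = [LETTER_TO_INT[g] for g in training]
y = [LETTER_TO_INT[g] for g in check]
r = pearson_r(x, y)
print(round(r, 3))  # 0.765
```

Any other assignment of equally spaced integers in the same order would give the same r, because the correlation is unchanged by a linear rescaling of either variable.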

These analyses indicated that results from the proficiency check
system used in Army flight evaluation were little more consistent in
evaluating student performance than was the traditional system in previous
flight training programs. If the assumption of reasonable consistency in
individual student performance is correct, there should be an
appreciable relationship between a student's training and check grades. Since this
was not the case, an examination of the system, with emphasis on check
pilot standards, is in order to determine the causes of low reliability.

Variation  in  Army  Check  Pilots’  Evaluation  Standards 

To determine the extent of variability in the evaluation standards
of check pilots in Army aviation flight training, grades were analyzed
for rides administered by the check section in the Department of
Rotary Wing Training at Fort Rucker during 1956 and 1957. Ten
Intermediate check scores were selected at random from those given by
each of eight check pilots, and 10 Advanced check scores were selected
for each of eight other check pilots. In Table 2, the mean check grade
and the range of the means of the check pilots in each group are
presented, as well as the mean variability and the range of variability.


Table 2

Means and Ranges of Rotary Wing Check Grades Given
by Check Pilots, Fort Rucker, 1956-1957 (a)

Training Phase             Mean (b)    Range

Intermediate
  Mean Check Grade         74.3*       70.6 to 79.0
  Standard Deviation        5.2*        2.2 to 7.3

Advanced
  Mean Check Grade         74.9        72.2 to 79.4
  Standard Deviation        6.5*        3.7 to 9.9

(a) Ten check grades are represented in each check pilot mean; eight check pilots are
represented in each value given in the table. Check pilots are not the same in the
analyses of Intermediate and Advanced grades.

(b) The symbol * indicates significance at the .05 level of confidence. Analysis of variance was used to
test differences among means. The L test was used to test differences among standard deviations (see
Palmer O. Johnson, Statistical Methods in Research, Prentice-Hall, Inc., New York, 1949, pp. 9106).


¹As the letter grade system was used at Camp Wolters (AA, above average; A, average;
BA, below average; U, unsatisfactory), the Pearson product-moment r was computed by assigning
successive integers to letter grades.

²The relationships between training grades and check grades are shown in Table 9, p. 25.

³Analysis of data from 100 students in the Army's fixed wing training program in 1957-58
showed interrelationships among the training grades and check grades of about the same
magnitude as the rotary wing interrelationships. (These data, from the fixed wing training programs at
Camp Gary, Tex. and Fort Rucker, Ala., are presented in Appendix B.)

The check pilots differed considerably in the mean ratings they
assigned students (some check pilots seldom fail a student and some
fail about half of their students). The differences are statistically
reliable for the Intermediate check scores (i.e., they were larger than
would result from chance differences in the proficiency of students
assigned to these pilots), but not the Advanced check scores.
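
The kind of comparison reported here can be sketched as follows: a mean and standard deviation for each check pilot, then a one-way analysis of variance F ratio across pilots. The grades are hypothetical (three pilots with five grades each, rather than the eight pilots with ten grades each in the study), and the sketch stops at the F ratio without looking up a critical value.

```python
from statistics import mean, stdev

# Hypothetical check grades grouped by check pilot (illustrative values only).
grades_by_pilot = {
    "pilot_a": [70, 72, 68, 75, 71],
    "pilot_b": [78, 80, 79, 81, 77],
    "pilot_c": [74, 65, 82, 70, 79],
}

# Per-pilot mean and standard deviation, as summarized in Table 2.
pilot_means = {p: mean(g) for p, g in grades_by_pilot.items()}
pilot_sds = {p: stdev(g) for p, g in grades_by_pilot.items()}
print(min(pilot_means.values()), "to", max(pilot_means.values()))  # range of means


def one_way_f(groups):
    """F ratio: between-pilot mean square over within-pilot mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)
    n_total = len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


f_ratio = one_way_f(list(grades_by_pilot.values()))
print(round(f_ratio, 2))
```

A large F ratio means the pilots' mean grades differ by more than the spread of grades within each pilot would suggest, which is the sense in which the Intermediate differences are called reliable.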

Since students were not assigned to check pilots on the basis of
prior student performance, student assignment is considered to have
been random and the results are interpreted to reflect differences in
individual check pilot standards. There is a tendency for pilots whose
average ratings are high to vary less in their ratings; that is, they rate
within a narrow range. This indicates that "easier" check pilots seem to
be less willing or less able to discriminate among student performances.

THE FLIGHT PROFICIENCY EVALUATION PROCESS

There  are  several  ways  in  which  individual  check  pilot  standards 
can  be  introduced  into  the  evaluation  process: 

(1) Flight Performance Sample. A check ride should be based
on a standardized sample of the student's performance in all the critical
maneuvers taught in the preceding phase. However, under the traditional
flight check system, the student flight performance actually sampled
on a check ride is determined to a large extent by the individual check
pilot. Each check pilot tends to have his own set of "favorite"
maneuvers which he believes best shows a student's capability. Then, too,
such factors as weather conditions, availability of a particular stage
field, and shortage of time may further influence the check pilot's
decision as to what he will include in the flight performance sample on
a particular check ride. The check pilot may require the student to
repeat a maneuver when he performs it poorly on the first attempt, thus
reducing the variety of maneuvers sampled in the time available for
the check ride.

To the extent that variations do occur in the flight
performance sample from one check ride to the next, different students are
faced with different "tests" that usually vary in degree of difficulty.
This is particularly true when one student is required to repeat a
difficult maneuver several times and another student is not. The first
student has a greater opportunity to err than the second student;
consequently, he will probably present a poorer over-all picture of his
performance, but not necessarily because he is less proficient over-all.

Thus,  the  test  situation  is  not  uniform.  Nevertheless,  the 
grade  for  a  check  ride  is  considered  to  reflect  uniformly  the  level  of 
competence  of  students  at  a  particular  level  of  training,  whether  the 
check  ride  consists  of  all  the  phase  maneuvers  or  a  selection  of  more 
difficult  or  easier  ones. 

Often, if a student performs dangerously on his first or
second maneuver, or perhaps halfway through the check, the check
pilot terminates the ride and gives the student a failing grade. Thus,
he relinquishes the opportunity to analyze the student's difficulties in
all maneuvers in which the student has been trained in the preceding
phase. When this is the case, subsequent additional flight time used
for the purpose of correcting the student's deficiencies is less likely
to be well directed.

Variation in test content, from one check to another, violates
what is probably the most fundamental principle of sound evaluation:
The sample of knowledge, performance, or behavior which is to be
measured must be uniform. Every deviation from the rule, "Every
student must be faced with the same set of requirements, under the
same conditions," leads to unreliable evaluation.

(2) Observation and Perception. There are many aspects of a
student's flight performance toward which the check pilot might direct
his attention, such as attitude, altitude, directional control, and power
control. Because he cannot observe all these things at once, the check
pilot must settle for only a few observations at any one time. From
those which he chooses to view at a particular time, his perceptual
process may eliminate more.

For example, at a certain point the check pilot may choose
to look at the instrument panel. What he actually sees on the instrument
panel, however, might be the air speed indicator to the exclusion of the
altimeter, tachometer, and needle ball; thus he would notice only
certain air speed deviations out of all the many elements he might have
observed at that moment. Undoubtedly, in such an instance check pilot
bias may play a significant role. Since the check pilot cannot see
everything, he looks at what he thinks is most significant.

This  problem  can  be  reduced  by  objectively  determining 
the  important  indices  of  flight  performance,  and  from  these  selecting 
and  standardizing  the  items  that  can,  practically,  be  observed.* 

(3) Memory. A check pilot must observe many details during
a check ride. Unless he records descriptions or evaluations of performance
at the time it is observed or very shortly thereafter, memory will
become a factor in determining the final grade. Indeed, if he completes
more than two check flights before recording his judgments on either,
he is apt to forget on which ride a particular event occurred. Check
pilots with good memories will probably base the grade on more complete
details of the student’s performance than will check pilots with
short memories. More critical, probably, is the problem of selective
recall; the check pilot may remember what was most dramatic or most
important from his own point of view, which may differ from what
another check pilot recalls. Thus, selective bias in observing performance
may be compounded by bias in recall of what was observed. To
the extent possible, standardized observations should be recorded during
or immediately after the actual student performance.


*A method frequently used in research for pinpointing interobserver differences is to have
two observers evaluate the same performance at the same time. HumRRO researchers' unsuccessful
attempts to do this are summarized in Appendix C.


(4) Relative Importance of Maneuvers. Because a single grade
must result from a check ride, a weighting method is implicit in the
grading process. For example, in helicopter flying, a well-executed
forced landing will usually be considered more important than a
well-executed normal take-off. Unfortunately, there is less than perfect
agreement even among experienced flight instructors as to the relative
importance of each training maneuver.

For example, 12 experienced check pilots comprising the
entire check section of the U.S. Army Primary Helicopter School
were asked to judge the contribution that each of 12 Intermediate and
7 Advanced maneuvers should make to a total flight evaluation score.*
The means and ranges of the values assigned to Intermediate maneuvers
are shown in Table 3, and those for Advanced maneuvers are
presented in Table 4. Although the individual check pilots generally
agreed on the order of importance of maneuvers, there was substantial
variation in ranges of values given each maneuver. For example, one
check pilot judged all Intermediate maneuvers as equally important
while another check pilot assigned values to the most important maneuvers
that were 20 times larger than those he allotted the least important
maneuvers. Such variation among check pilots in itself can lead to
marked scoring variability.
Table 3

Estimates of the Percentage Contribution
12 Intermediate Maneuvers Should Make to Total Score

                                   Percentage Contribution
Maneuver                           Mean      Range

Normal Take-Offs                     7       3 to 9
Basic Autorotations                 12       9 to 20
Normal Approaches                    9       3 to 13
180° Autorotations                  11       5 to 20
Maximum Performance Take-Offs        8       3 to 12
Traffic Patterns                     6       3 to 13
Hovering                             7       3 to 15
Steep Approaches                     9       3 to 12
Running Take-Offs                    6       1 to 9
Hovering Autorotations               6       3 to 10
Running Landing                      6       1 to 10
Forced Landings                     13       8 to 20
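
The effect of disagreement over maneuver weights can be illustrated with a small sketch. The mean weights below are the Table 3 means; the ratings, the 0-to-10 rating scale applied here, and the function name composite_score are assumptions made only for illustration and are not taken from the report.

```python
# Mean percentage contributions transcribed from Table 3.
TABLE_3_MEAN_WEIGHTS = {
    "Normal Take-Offs": 7,
    "Basic Autorotations": 12,
    "Normal Approaches": 9,
    "180-Degree Autorotations": 11,
    "Maximum Performance Take-Offs": 8,
    "Traffic Patterns": 6,
    "Hovering": 7,
    "Steep Approaches": 9,
    "Running Take-Offs": 6,
    "Hovering Autorotations": 6,
    "Running Landing": 6,
    "Forced Landings": 13,
}


def composite_score(ratings, weights):
    """Weighted mean of per-maneuver ratings (weights need not sum to 100)."""
    total_weight = sum(weights[m] for m in ratings)
    return sum(ratings[m] * weights[m] for m in ratings) / total_weight


# Hypothetical ride: solid on most maneuvers, weak on two emergency maneuvers.
ratings = {maneuver: 8 for maneuver in TABLE_3_MEAN_WEIGHTS}
ratings["Forced Landings"] = 2
ratings["Basic Autorotations"] = 3

# A check pilot who treats every maneuver as equally important.
equal_weights = {maneuver: 1 for maneuver in TABLE_3_MEAN_WEIGHTS}

equal_score = composite_score(ratings, equal_weights)
table_3_score = composite_score(ratings, TABLE_3_MEAN_WEIGHTS)

print(f"equal weights:   {equal_score:.2f}")
print(f"Table 3 weights: {table_3_score:.2f}")
```

Under these assumed ratings the same ride averages about 7.1 with equal weights and about 6.6 with the Table 3 means, so two check pilots who observed identical performance could still arrive at different over-all scores.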

(5) The Pilot's Expectations of Student Performance. In his
final evaluation, the check pilot must make a judgment about the level
of performance that can reasonably be expected from a student for a
given phase of training. In the Air Force training programs in the late
1940's and early 1950's, such judgments were found to be substantially


*Each check pilot had a minimum of four years' experience in flight training and evaluation
programs. Six of the pilots had at least two years' experience as civilian supervisors and check
pilots in the Army Helicopter Flight Training program. The six military check pilots had from four
to eight years' experience in helicopter flight training. All 12 check pilots had attended the same
standardization program for primary helicopter instructors and had worked together for two years.
Table 4

Estimates of the Percentage Contribution
7 Advanced Maneuvers Should Make to Total Score(a)

                                   Percentage Contribution
Maneuver                           Mean      Range

Take-Offs                           16       5 to 25
Slope Operations                     7       3 to 10
Approaches                          17       14 to 20
Planning Items                      23       15 to 40
Hovering Autorotations               7       2 to 12
Forced Landings                     21       10 to 30
Aircraft Control and Patterns       11       5 to 20

(a) One of the 12 USAPHS check pilots did not rate the Advanced maneuvers.


affected by the proficiency of the students whom the instructors used
as a basis of comparison. In 1953, Boyle and Hagin (2) demonstrated
in a primary pilot training program that 70 per cent of the students
with no previous flying experience passed when they were grouped
together under the same instructors, and only 49 per cent of these
students passed when they were considered with students who had had
prior light plane training.

In 1957, Krumboltz and Christal (13) reported data that
demonstrated the variation among Air Force instructors in the level of
proficiency they expected of their students. The study analyzed the
grades for a sample of 216 Air Force aviation cadets from one primary
training base during a six-year period. It revealed that a cadet had a
better chance of success if he was grouped with cadets of relatively
lower aptitude. This was true within several aptitude levels.*

To summarize, the variation in check pilots' standards can be
manifested in a number of ways in the flight performance evaluation
process. These standards can influence the selection of the flight performance
sample, the direction of attention during the flight to certain
aspects of performance, perceptual selection from the information to
which attention is directed, what is remembered about the performance
at the time of recording, the relative importance given to the various
maneuvers, and expectations of what student performance should be.

In  a  process  as  complex  and  as  important  as  the  check  flight,  it  is 
mandatory  that  the  check  pilot’s  standards  for  the  evaluation  process 
be  as  uniform  as  possible. 

For these reasons, work was initiated on the development of a
flight check system designed to reduce variations in check pilot standards,
standardize the sample of flight performance on which scoring
is based, and reduce the effects of the check pilots’ observation and
memory bias on over-all score reliability.

*Aptitude was measured with the pilot stanine predictors used by the Air Force for selection
of air cadets.
HumRRO RESEARCH
ON  FLIGHT  PROFICIENCY  EVALUATION 

DEVELOPMENT  OF  THE  PILOT  PERFORMANCE 
DESCRIPTION  RECORDS  (PPDR's) 

As part of Subtask LIFT II, development of standardized, relatively
objective measures of primary and basic light helicopter pilot proficiency
was undertaken. Initial guidelines for the format of the measures
were provided by Air Force research (18, 19).

An analysis of the training program, including study of grade books
and interviews with instructors, was the basis of determining the areas
of flight training in which students have the most difficulty (17). Each
primary and basic training maneuver was analyzed into its components.
For each component, simplified scales were developed on which the
check pilot could quickly record his observations or judgments as the
student performed. For some components (such as pedal usage, approach
path to confined areas, and ground track on downwind legs) on which the
check pilot had to make more complex judgments, rating scales of a
more subjective type had to be developed. In such items the points on
the scales were defined as precisely as possible to minimize personal
interpretation by the check pilot. Where possible, however, scales
were developed on which the check pilot could immediately record direct
observations on instruments or outside cues (such as RPM, air speed,
altitude, and approach termination points).

The original list of item components for each maneuver was thoroughly
tested in simulated check rides by LIFT II research personnel.
It was found that the number of items was more than a check pilot could
safely evaluate in the allotted time. Therefore, experienced flight
training personnel were asked to select only the items which would
adequately describe the most critical components from each maneuver
segment. In subsequent tryouts a descriptive record of student performance
was produced which could be administered safely and accurately
by a trained check pilot.

The measures, the Pilot Performance Description Records
(PPDR’s), were based on standard rides; that is, the same number of
maneuvers were to be flown in the same sequence on each ride. The
check pilots were instructed to immediately record their observations
or judgments of each maneuver component, except for those maneuvers
in which safety considerations dictated against this procedure. For
autorotations, the latter half of approaches, the initial phase of take-offs,
and forced landings, recording was postponed until completion of
the maneuver.

The Pilot Performance Description Records were administered on
a trial basis in 1957, and in revised form in 1958. The 1958 versions
of the Intermediate and Advanced PPDR's are described in the Manual
of Instruction for use of the PPDR’s (10).

TRYOUT  OF  THE  1957  VERSION  OF  THE  PPDR 
Procedure 

In 1957, the PPDR was used in check rides administered to 75 students
(40 Intermediate; 35 Advanced) at the U.S. Army Primary Helicopter
School (USAPHS) at Camp Wolters. Examples of maneuver record sheets from
the 1957 versions of the Intermediate and Advanced PPDR’s* are
presented in Figures 3 and 4.

Each student flew two check rides, each with a different check
pilot. The student’s first ride was flown by one of the two check pilots
on the LIFT staff, and the second by one of four USAPHS military
check pilots. This procedure made it possible to estimate the agreement
(ride/ride relationship) between repeated evaluations of the same
students. The student did no flying between these check rides.
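
As a rough illustration of a ride/ride relationship, the sketch below computes a product-moment correlation between first-ride and second-ride scores for the same students. The choice of the Pearson coefficient is an assumption (the passage does not name the coefficient used), and the scores and the name pearson_r are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either list has no variation.
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: each position is one student, scored on two rides
# by two different check pilots.
first_ride = [72, 85, 64, 90, 78]
second_ride = [70, 80, 70, 88, 75]

r = pearson_r(first_ride, second_ride)
print(f"ride/ride correlation: {r:.2f}")
```

A value near 1 would mean the two check pilots ordered the students almost identically; a value near 0 would mean the second ride told little about the first.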

The assignment of students to the LIFT check pilots was on a
random or “chance” basis. For the second ride, each military check
pilot was alternately assigned a student checked by the first LIFT check
pilot and one checked by the second LIFT check pilot. The initial random
assignment of students to the two LIFT check pilots ensured that
there was no selective bias throughout the checking procedure.

The LIFT check pilots were intimately familiar with the system,
having been part of the team responsible for its development. The
military check pilots had received only a brief training program from
the LIFT research staff pilots. This training consisted of (1) approximately
four hours of lectures, during which the rationale of the system
was presented and each type of scale was described and interpreted;
(2) in-flight demonstration by the LIFT staff check pilots of the recording
system, including safety training (e.g., the check pilot is to stop
recording during certain maneuvers or parts of maneuvers for reasons
of safety); (3) a complete check ride with a LIFT check pilot acting as
the student and the military pilot recording the flight; and (4) at least
one practice ride with a student pilot.


*Readers familiar with the Camp Wolters training program will note that “Intermediate” is
substituted for “Primary” and “Advanced” for “Basic” to reduce confusion that might result from
referring to the primary phase of primary training. “Intermediate” and “Advanced” had previously
been used to refer to the same training phases at Fort Rucker when primary helicopter training
was conducted there.
[Figure 3. Sample record sheet: NORMAL APPROACH. Form not reproduced.]

[Figure 4. Sample Record Sheet From 1957 Version of the Advanced (Basic)
Pilot Performance Description Record: CONFINED AREA OPERATION. Form not
reproduced. Legible entries: I. High Reconnaissance (A. pattern flown with
respect to wind and forced landing areas; B. aircraft control: airspeed,
40-50 K; RPM, 3050-3150; pedals). II. Approach and Low Reconnaissance
(A. down to barrier: line of descent, approach angle, rate of closure,
pedals; B. over barrier to ground: clearance of barrier, line of descent,
rate of closure, RPM, 3050-3150; unnecessary hovering; pedals). Legible
scale points include Low, High, Slow, Fast, Shallow, Steep, and Proper.]



Scoring of the PPDR's


Four types of scores were used in scoring the check rides: total
error score, item-weighted score, error pattern-weighted score, and
traditional score. The scores are defined as follows:

The total error score: the number of item errors recorded
on the PPDR.

The item-weighted score: the sum of item errors weighted by
item importance and difficulty, converted to a percentage of the total
possible score. Item weights are the average of values (ranging from
1 to 5) assigned by experienced check pilots judging the difficulty and
importance of each item.

The error pattern-weighted score: the sum of check pilot
ratings on maneuver segments, weighted by maneuver importance and
converted to a scale ranging from 0 to 100. Check pilots rate the performance
on each maneuver segment on a scale ranging from 0 to 10
(0, dangerous; 1-2, unsatisfactory; 3-5, below average; 6-8, average;
9-10, above average). Maneuver weights reflecting the difficulty and
importance of each maneuver are the average of values assigned by
experienced check pilots.

A traditional score: an over-all score for the check ride in
terms of a letter grade (AA, above average; A, average; BA, below
average; U, unsatisfactory). Check pilots were asked to assign this
score on the basis of their own judgments (i.e., not to take the PPDR’s
into account).

The HumRRO check pilots scored all of the Intermediate check
rides to obtain the error pattern-weighted scores.
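
The three PPDR-derived scores can be sketched as follows. This is one possible reading of the definitions above, not the report's actual scoring procedure: it assumes that the item-weighted score is the percentage of possible weighted points retained (so that a higher score is better), and that each segment rating is weighted by the weight of its maneuver. All item names, weights, ratings, and function names are hypothetical.

```python
def total_error_score(item_errors):
    """Count of items on which an error was recorded."""
    return sum(1 for made_error in item_errors.values() if made_error)


def item_weighted_score(item_errors, item_weights):
    """Percentage of possible weighted points kept (assumed direction)."""
    possible = sum(item_weights[item] for item in item_errors)
    lost = sum(item_weights[item] for item, bad in item_errors.items() if bad)
    return 100.0 * (possible - lost) / possible


def error_pattern_weighted_score(segment_ratings, maneuver_weights):
    """Weighted sum of 0-10 segment ratings, rescaled to 0-100."""
    weighted = sum(
        rating * maneuver_weights[maneuver]
        for (maneuver, _segment), rating in segment_ratings.items()
    )
    maximum = 10 * sum(
        maneuver_weights[maneuver] for (maneuver, _segment) in segment_ratings
    )
    return 100.0 * weighted / maximum


# Hypothetical record for part of one check ride.
item_errors = {"pedals": True, "rpm": False, "air speed": True, "altitude": False}
item_weights = {"pedals": 2.0, "rpm": 4.0, "air speed": 3.0, "altitude": 1.0}

segment_ratings = {
    ("normal approach", "entry"): 6,
    ("normal approach", "termination"): 8,
    ("forced landing", "glide"): 3,
}
maneuver_weights = {"normal approach": 9, "forced landing": 13}

errors = total_error_score(item_errors)
item_score = item_weighted_score(item_errors, item_weights)
pattern_score = error_pattern_weighted_score(segment_ratings, maneuver_weights)
print(errors, round(item_score, 1), round(pattern_score, 1))
```

The sketch shows why the scores can disagree: the total error score treats the two recorded errors alike, while the weighted scores let heavily weighted items or maneuvers dominate.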

Results  of  the  1957  PPDR  Tryout 

The ride/ride relationships for the PPDR check rides administered
in the 1957 tryout are presented in Table 5. These data indicate an
increase in reliability over the traditional system, particularly in the
item-weighted and error pattern-weighted scores.


Table 5

Correlations Between Rides, 1957(a)
(Camp Wolters)

                                Intermediate PPDR    Advanced PPDR
Score                                (N=40)              (N=35)

PPDR Score
  Item-Weighted                       .42*                .37*
  Error Pattern-Weighted(b)           .51*                 --
  Total Error                         .17                 .28*
Traditional Grade                     .22*                .10

(a) The symbol * indicates significance at the .05 level of confidence.

(b) The error pattern-weighted scores were obtained only for the Intermediate PPDR's.




The diagnostic capacity of the 1957 PPDR's was clearly demonstrated.
The PPDR’s made it possible to count not only total errors
but also errors on specific elements in the performance (such as pedals,
RPM, air speed, altitude, and ground track).

An  analysis  was  made  of  the  errors  recorded  by  the  check  pilots 
on  selected  PPDR  scales  (those  accounting  for  over  half  of  the  PPDR 
items)  and  two  over-all  scores.  The  difference  between  the  LIFT  staff 
pilots  (who  were  thoroughly  familiar  with  the  PPDR)  was  subtracted 
from  the  average  difference  between  the  USAPHS  pilots  (who  were  only 
oriented  in  the  use  of  the  PPDR)  to  obtain  a  "similarity  index"  for  each 
PPDR  item  and  score  listed  in  Table  6.  The  scoring  similarity  of  the 
LIFT  pilots  was  reliably  greater  for  the  items  "pedals"  and  "RPM" 
and  for  the  traditional  grades.  The  difference  between  the  two  groups 
on  the  remaining  items  analyzed  was  negligible. 
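
The similarity index can be sketched in a few lines. The text does not spell out how the "average difference" among the four USAPHS pilots was taken; the sketch assumes the mean absolute difference over all pairs of pilots, which reproduces the Table 6 index for the two rows used here after rounding. The function names are hypothetical.

```python
from itertools import combinations


def mean_pairwise_difference(values):
    """Mean absolute difference over every pair of check pilots."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def similarity_index(expert_values, oriented_values):
    """Positive when the PPDR experts agree more closely than the
    PPDR-oriented pilots do (assumed reading of the text)."""
    return (mean_pairwise_difference(oriented_values)
            - mean_pairwise_difference(expert_values))


# Rows from Table 6: two LIFT (expert) pilots, four USAPHS (oriented) pilots.
traditional_grade = similarity_index([40, 30], [100, 0, 30, 60])
item_weighted = similarity_index([75, 74], [85, 72, 76, 83])

print(traditional_grade)        # Table 6 reports 45 for this row
print(round(item_weighted))     # Table 6 reports 7 for this row
```

A large positive index means the pilots who merely had an orientation to the PPDR differed from one another much more than the two experts did.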
Table 6

Mean Percentage of Errors Recorded for Selected Items
and Mean Over-All Scores for the Intermediate PPDR's, 1957

                                             Check Pilots
                                      PPDR         PPDR-Oriented      Similarity
Item                                 Experts                           Index(a)

Number of Rides                      20   20     10   10   10   10

Mean Percentage of All Errors
Possible for(b):
  Pedals                             21   19      5   29   17   20       11*
  RPM                                37   37     10   36   36   17       16*
  Air Speed                          38   47     21   33   38   25        1
  Altitude                           26   22     19    9   14    8        2
  Ground Track                       15   13      5   24   15   11        2

Mean PPDR Item-Weighted Score        75   74     85   72   76   83        7

Traditional Grade(c)                 40   30    100    0   30   60       45*

(a) The t test was used to determine the significance of the amount of difference between the
two groups of check pilots. The one-tailed test was used for the null hypothesis that the check
pilots who were experts in the PPDR system were as similar or less similar to each other as were
the check pilots who were only oriented to the PPDR system. The symbol * indicates a difference
that is significant at the .05 level of confidence.

(b) These scales constituted over half of the items on the PPDR.

(c) Based on the percentage of “average” and “above average” grades given. The PPDR was not
referred to in assigning the traditional grades in 1957.

The analysis of the Advanced PPDR data in Table 7 indicates there
is no systematic difference between the similarity of PPDR experts
and that of the PPDR-oriented check pilots. The only significant difference
between the two groups is for the traditional score, on which the
PPDR experts were reliably more alike. For the item-weighted score
and for "air speed," there is no difference in the amount of similarity
of the evaluations made by the two groups of check pilots. On the
remaining items, the PPDR-oriented check pilots were more similar
than the PPDR experts, but not to a degree that is statistically reliable.
Since the LIFT pilots had extensive experience with the Intermediate
PPDR only (their work with the Advanced PPDR was in the early stages),
their familiarity with the Advanced PPDR was little greater than that
of the PPDR-oriented check pilots. Over-all, the value of experience
in the use of the PPDR is strongly indicated.

Table 7

Mean Percentage of Errors Recorded for Selected Items
and Mean Over-All Scores for the Advanced PPDR's, 1957

                                             Check Pilots
                                      PPDR         PPDR-Oriented      Similarity
Item                                 Experts                           Index(a)

Number of Rides                      15   20     10   10    6    9

Mean Percentage of All Errors
Possible for(b):
  Pedals                              9   18    (?)    5    7   10       -4
  RPM                                22   36      9   12   15    8      -10
  Air Speed                          36   31     21   18   23   27        0
  Altitude                           22   29     19   12   15   15       -4
  Cyclic Control                     35   22     21   10   15   15       -8
  All Items                          31   26     24   24   24   30       -2

Mean PPDR Item-Weighted Score        80   83     88   85   85   82        0

Traditional Grade(c)                 47   50     80   30   50   44       23*

(a) The t test was used to determine the significance of the amount of difference between the
two groups of check pilots. The one-tailed test was used for the null hypothesis that the check
pilots who were experts in the PPDR system were as similar or less similar to each other as were
the check pilots who were only oriented to the PPDR system. The symbol * indicates a difference
that is significant at the .05 level of confidence.

(b) These scales constituted over half of the items on the PPDR.

(c) Based on the percentage of “average” and “above average” grades given. The PPDR was not
referred to in assigning the traditional grades in 1957.


TRYOUT  OF  THE  1958  VERSION  OF  THE  PPDR 
Procedure 

The PPDR's were modified on the basis of practical experience and
data obtained during the 1957 experimental administration. Revisions
of the Intermediate PPDR were relatively minor, consisting largely of
changes in format making it easier for the check pilot to determine quickly
where to record his observations. A few items which had been shown
to serve no purpose were eliminated, and others were added where it
had been found that student performance was not described sufficiently.


Modifications  of  the  Advanced  PPDR  were  substantial.  The  type 
of  specific  scale  used  in  the  Intermediate  PPDR  items  was  substituted 
for  the  more  categorical  type  of  scale  which  had  been  used  in  the  1957 
Advanced  PPDR.  Many  ineffective  scales  were  eliminated  on  the  basis 
of  experience,  and  a  set  of  maneuvers  requiring  take-offs  and  approaches 
over  a  tree,  both  into  the  wind  and  crosswind,  was  added. 

Examples of the format used for the 1958 version of the Intermediate
and Advanced PPDR’s are presented in Figures 5 and 6. Both PPDR’s
are described in detail in the Manual of Instruction for the use of
the PPDR’s (10).

In  the  1958  experimental  tryout,  12  check  pilots  were  trained  to 
administer  the  PPDR.  Six  of  the  check  pilots  were  civilian  flight 
commanders  or  other  responsible  training  administrators  with  the 
Southern  Airways  civilian  contract  school  at  the  USAPHS,  Camp  Wolters, 
and  six  were  military  pilots  who  were  part  of  the  monitoring  military 
check  section  at  the  USAPHS. 

The training program was administered to the 12 check pilots by
the two LIFT pilots who had participated in the 1957 evaluation. The
1958 training was somewhat more comprehensive than that given in
1957. It lasted one week and consisted of (1) a three-hour detailed
presentation and discussion of the individual scales in the Intermediate
and Advanced checks; (2) two hours of in-flight orientation in the use
of the PPDR's, conducted by the two LIFT staff pilots; (3) practice
with at least one student; (4) a final "evaluation" ride with a LIFT
pilot simulating student performance; and (5) a procedure for identifying
markedly different individual check pilot standards, and partially
modifying these standards (requiring approximately five hours of
classroom work for each check pilot).*

Following  the  check  pilot  training  program,  two  successive  check 
rides  were  administered  to  each  of  50  Intermediate  and  50  Advanced 
student  pilots  to  obtain  estimates  of  check  pilot  agreement  (reliability). 
The  first  ride  was  always  administered  by  one  of  the  civilian  pilots, 
and  the  second  by  one  of  the  military  pilots. 

The four scores computed for each PPDR check ride were essentially
as described for the 1957 tryout. However, in 1958 each check
pilot (military and civilian) scored the PPDR immediately after completion
of the check ride and provided the error pattern-weighted
score. Also, the check pilots were required to base the "traditional"
score on a careful review of the PPDR results.

Results  of  the  1958  PPDR  Tryout 

Ride/ride relationships for the 1958 PPDR tryout are presented
in Table 8. The error pattern-weighted and the traditional scores are
the most reliable. The ride/ride relationship for the traditional score
in the 1958 PPDR tryout is reliably higher than that for 1957.


*This procedure is described in more detail in Chapter 3, pp. 28-29.
These results provide evidence that subjective judgments can, under
controlled conditions, serve to provide a reliable general estimate of
student proficiency.

Table 8

Correlations Between Rides, 1958(a)
(Camp Wolters)

Score                                Intermediate PPDR    Advanced PPDR

Item-Weighted                              .17                 .42*
Error Pattern-Weighted                     .42*                .52*
Total Error                                .14                 .37*
Traditional Grade (PPDR-based)             .48*                .47*

(a) The number of students is 50 in all cases except for the Intermediate error
pattern-weighted score, where it is 48. The symbol * indicates significance at the .05 level
of confidence.

It was stated earlier that a flight proficiency evaluation system
should reflect to a substantial degree the consistency presumed to
exist in student flight performance from early to later training. Data
presented in Table 1 and in Appendix B indicate that interrelationships
between the Army's traditional check scores and training grades are
low. By comparison, the relationships between the 1958 PPDR scores
given by the military check pilots (on the second of two rides by each
student) and the training grades given in the Pre-Solo, Intermediate,
and Advanced phases of training are substantially higher. To facilitate
comparison, Table 9 presents the relationships of training grades with
check scores given at Camp Wolters in 1957 and with those given during
the PPDR administration in 1958.

For the Advanced training phase, the PPDR error pattern-weighted
scores and traditional scores (based on the PPDR check ride) should
show a relatively high relationship to training grades. This is particularly
significant because the Advanced phase of training includes the
maneuvers that are most similar to those a helicopter pilot would have
to perform in tactical flying. The relationships between traditional
grades (PPDR-based) and training grades are also high for the Intermediate
training phase. However, the relatively low relationships of
the Pre-Solo scores and the Intermediate PPDR scores to training
grades may suggest that there is less consistency in student performance
in the early phases of training as compared with the later phase.
Miller  (y )  has  suggested  that  the  crucial  source  of  unreliability  of 
check  rides  is  the  lack  of  consistency  in  pilot  performance  from  day  to 
day.  The  relatively  high  reliability  of  results  for  the  Advanced  PPDR 
may suggest that this conclusion is appropriate only for the early
stages of training. The somewhat higher relationships between Intermediate
check scores and Advanced training grades further support
this interpretation.
Table 9

Correlations Between Training Grades and Check Grades (1957)
and Training Grades and PPDR Scores (1958)(a)
(Camp Wolters)

                                             Training Grade
Check Grade or Score                Pre-Solo   Intermediate   Advanced

1957 (N=55)
  Traditional Check Grade
    Pre-Solo                          .?5*        .44*          .10
    Intermediate                      .14         .14           .45*
    Advanced                          .21         .14           .10

1958 (N=50)
  Pre-Solo                            .58*        .41*          .41*
  Intermediate PPDR Score
    Total Error                       .14         .14           .28*
    Item-Weighted                     .20         .22           .33*
    Error Pattern-Weighted            .22         .14           .33*
    Traditional (PPDR-Based)          .35*        .42*          .37*
  Advanced PPDR Score
    Total Error                       .50*        .55*          .42*
    Item-Weighted                     .48*        .57*          .44*
    Error Pattern-Weighted            .52*        .65*          .53*
    Traditional (PPDR-Based)          .55*        .60*          .51*

(a) The symbol * indicates significance at the .05 level of confidence.


Since  relationships  between  check  scores  and  training  grades, 
particularly  for  the  Advanced  PPDR,  are  of  a  magnitude  that  requires 
relatively  high  reliability  of  measurement,  it  appears  that  the  PPDR 
evaluation  system  is  basically  sound. 

The marked improvement in the reliability of the concurrent traditional
score may be attributable to (1) the diagnostic data obtained
with the PPDR, (2) the necessity to review the PPDR’s to determine
over-all scores, and (3) the PPDR check pilot training program. However,
an additional factor must be considered: Both the civilian
(Southern Airways) training and military check section personnel at
Camp Wolters were devoting every effort between the 1957 and 1958
tests to improvement of the training and monitoring system. Undoubtedly
these efforts resulted in increased standardization as well as
improved training. This is suggested by the somewhat higher relationships
in 1958 than in 1957 between training grades and pre-solo check
grades (which were not based on the PPDR system). Unfortunately,
data reflecting relationships between training grades and the traditional
check grades were not analyzed just before the 1958 tryout. This
would have provided a more complete control for the comparison of
training grades with check grades and of the PPDR with the traditional
check system. Thus, the data presented do not constitute proof, but
are only substantiating evidence that the PPDR system was more
reliable than the traditional check system in 1958.

Substantial  improvement  in  reliability  is  indicated,  however,  by  the 
higher  interrelationships  between  training  grades  in  1958  (ranging  from 
.61  to  .74)  as  compared  with  those  for  1957  (ranging  from  .10  to  .45). 
The  sizable  increase  in  training  grade  interrelationships  in  the  1958 
PPDR  tryout  suggests  that  the  evaluation  of  training  rides  as  well  as 
check  rides  had  become  more  standardized. 

However, even more standardization is necessary since there was still
considerable variation among check pilots in the PPDR’s administered in
1958. The percentages of all possible errors that were scored by the
12 check pilots on selected items and for all items of the PPDR’s, as
well as PPDR-derived scores, were computed and the means and standard
deviations are presented in Appendix Table D-1. Although the one-week
training program given for the 1958 tryout appears to have made check
pilot standards more uniform than in 1957, it did not eliminate check
pilot differences. It is noteworthy, however, that a major contribution
of the PPDR’s is the extent to which they allow specification of some
of the variation in check pilot standards.
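
The computation described above (percentage of possible errors scored
by each check pilot, then the mean and standard deviation across check
pilots) can be sketched as follows. The pilot labels and tallies are
hypothetical, and the use of the sample standard deviation is an
assumption; the report does not state which form was used.

```python
from statistics import mean, stdev

# Hypothetical tallies for one PPDR item:
# (errors scored, errors possible) for each check pilot.
tallies = {
    "pilot_A": (12, 80),
    "pilot_B": (20, 80),
    "pilot_C": (8, 80),
}


def percent_scored(scored: int, possible: int) -> float:
    """Percentage of all possible errors that a check pilot actually scored."""
    return 100.0 * scored / possible


percentages = [percent_scored(s, p) for s, p in tallies.values()]

# Mean and spread across check pilots; a large spread means the pilots
# differ in how readily they mark this item as an error.
item_mean = mean(percentages)
item_sd = stdev(percentages)  # sample SD (assumption)

print(f"mean = {item_mean:.1f}%, SD = {item_sd:.1f}")
```

A large standard deviation relative to the mean on a given item points
to the items on which check pilot standards diverge most.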

The  PPDR  system  itself  is  substantially  more  diagnostic  and  more 
reliable  than  the  traditional  system.  However,  either  a  more  intensive 
check  pilot  training  program  or  a  check  pilot  selection  program,  or 
more likely both, must be initiated if the remaining substantial effects
of check pilot biases on flight proficiency evaluation reliability are to
be  further  reduced. 


Chapter 3

APPLICATION OF THE PPDR SYSTEM

CHARACTERISTICS  OF  THE  PPDR  SYSTEM 

The prototype flight check evaluation system developed in this study
consists of (1) Intermediate and Advanced PPDR booklets on which
personnel serving as check pilots can score specific maneuvers on
standardized, relatively objective scales; (2) a training program to
familiarize the check pilot with the concepts and techniques involved
in the PPDR system and to give him practice in administering check
rides using the PPDR; (3) classroom training for check pilots in
scoring standard PPDR's (that is, an identical set of PPDR’s for actual
check rides) to allow identification of specific areas in which the
check pilots' standards of evaluation are atypical; and (4) methods of
scoring the PPDR, the most promising of which is the error
pattern-weighted score, which reflects both the importance of each
maneuver, in the judgment of expert opinion, and the check pilot's
evaluation of over-all performance of each maneuver.

The PPDR system requires that the same flight test situation be
presented to each student pilot. The type and number and, insofar as
possible, the sequence of maneuvers included in a flight check are
rigorously standardized. This fulfills the fundamental principle of
sound evaluation that all students be exposed to conditions which are
as nearly identical as possible. The existence of variables that cannot
be controlled, such as weather and differences in flight
characteristics of aircraft, makes it even more essential that
controllable factors be standardized.

The  PPDR  provides  a  detailed  and  permanent  record  of  the  student's 
performance  on  a  flight  sample  of  critical  maneuvers.  The  record  can 
be  analyzed  in  detail  to  diagnose  student  performance  or  to  compare 
check  pilot  observations  with  those  of  other  check  pilots.  The  flight 
performance  sample  utilized  in  the  PPDR  system  is  realistic;  it  has 
been  selected  on  the  basis  of  a  complete  analysis  of  training  maneuvers, 
tactical  flying  requirements,  and  expert  pilot  opinion.  Most  crucial 
maneuvers  are  included  in  the  PPDR  check  ride. 

SCORING  STANDARD  PPDR's  AS  PART 
OF  THE  1958  TRAINING  PROGRAM 

The requirement that check pilots use similar standards in recording
their observations and in scoring the data that they have recorded
cannot be overemphasized. One method of determining whether check
pilots are using similar standards is to have them evaluate the same
performance and then compare their evaluations.* This was attempted
as part of the LIFT study.

During  the  one  week  of  training  that  check  pilots  received  in  1958, 
each  of  the  12  check  pilots  was  presented  with  an  identical  set  of 
10  completed  PPDR  descriptions  of  actual  student  flight  performances. 
The  PPDR’s  were  selected  from  those  administered  as  part  of  the 
1957 evaluation program and represented a wide range of student
performance. The cover sheets on which final scores and information
about  the  student  had  been  recorded  were  removed  so  that  the  pilots 
would  not  have  any  initial  bias. 

The check pilots assigned a rating from 0 to 10 (0, dangerous;
1-2, unsatisfactory; 3-5, below average; 6-8, average; 9-10, above
average) to each of more than 100 maneuvers and maneuver components
for each PPDR. These ratings were multiplied by maneuver weights
which had been determined by a group of expert check pilots on the
basis of difficulty and criticality of each maneuver. A total score was
then determined by summing the weighted ratings.*
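
The scoring rule just described (rating times maneuver weight, summed
over maneuvers) can be illustrated with a minimal sketch. The maneuver
names and weights below are hypothetical placeholders; the actual
weights were set by the expert check pilots and are not reproduced here.

```python
# Hypothetical maneuver weights, for illustration only.
WEIGHTS = {"hovering": 3, "normal_takeoff": 2, "autorotation": 5}


def rating_label(rating: int) -> str:
    """Map a 0-10 rating to the verbal category given in the text."""
    if not 0 <= rating <= 10:
        raise ValueError("rating must be between 0 and 10")
    if rating == 0:
        return "dangerous"
    if rating <= 2:
        return "unsatisfactory"
    if rating <= 5:
        return "below average"
    if rating <= 8:
        return "average"
    return "above average"


def weighted_total(ratings: dict, weights: dict) -> int:
    """Sum of (rating x maneuver weight) over all rated maneuvers."""
    return sum(ratings[m] * weights[m] for m in ratings)


# One check pilot's ratings for one PPDR (hypothetical values).
ratings = {"hovering": 7, "normal_takeoff": 6, "autorotation": 4}
total = weighted_total(ratings, WEIGHTS)

for maneuver, rating in ratings.items():
    print(f"{maneuver}: {rating} ({rating_label(rating)})")
print("weighted total:", total)
```

Because heavily weighted maneuvers dominate the total, a low rating on a
critical maneuver lowers the score more than the same rating on a minor
one.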

It should be noted that in scoring these 10 PPDR’s the 12 pilots
were required to evaluate only recorded descriptions of the flight
performance. No actual flight checking was involved.

For  the  12  check  pilots,  a  score  for  each  of  the  10  standard  PPDR’s 
was  obtained.  Correlations  between  pairs  of  check  pilots’  evaluations 
ranged  from  .82  to  .99,  indicating  considerable  agreement  in  scoring 
between  the  check  pilots.  However,  the  differences  between  check 
pilots,  even  within  this  limited  range,  appear  meaningful  in  terms  of 
agreement  in  scoring  actual  flight  checks. 
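
The pairwise agreement figures above are correlations between the total
scores that two check pilots gave to the same set of standard PPDR’s. A
minimal sketch of that comparison follows, using hypothetical scores
for three pilots on five records (the study used 12 pilots and 10
records) and assuming the Pearson product-moment coefficient, which the
report does not name explicitly.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical total scores given by each check pilot to the same records.
scores = {
    "pilot_1": [410, 520, 380, 610, 450],
    "pilot_2": [420, 530, 390, 620, 460],
    "pilot_3": [400, 500, 420, 580, 470],
}

# Correlate every pair of check pilots across the shared records.
pairwise = {
    (a, b): pearson(scores[a], scores[b])
    for a, b in combinations(scores, 2)
}

for pair, r in pairwise.items():
    print(pair, round(r, 2))
print("range:", round(min(pairwise.values()), 2),
      "to", round(max(pairwise.values()), 2))
```

Note that a correlation measures agreement in rank and spacing, not in
level: pilot_2 scores every record 10 points higher than pilot_1, yet
the two correlate perfectly.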

CLASSROOM SCORING AGREEMENT
AND RIDE/RIDE RELATIONSHIPS

The  relatively  simple  classroom  technique  described  above  shows 
promise  as  a  method  for  quickly  pinpointing  differences  in  check  pilot 
standards  that  would  produce  differing  results  in  actual  flight  checks. 
Following  the  administration  of  the  PPDR’s  in  1958,  it  was  possible  to 
select  pairs  of  check  pilots  who  had  checked  the  same  students  and  to 
compare the agreement of their standards in the classroom scoring
with  the  agreement  of  the  scores  given  by  them  to  the  same  students 
during  a  flight  check. 

Table 10 shows the relationships of PPDR flight check scores
for check pilot pairs whose classroom scoring agreements were from
.82-.99 (all pairs), .91-.99, and .95-.99. It is clear from Table 10
that more agreement in the classroom scoring does mean considerably
more agreement in actual flight check evaluation for the Intermediate

*Attempts were also made to present the same performance to check pilots by flying
two observers at the same time and by presenting student performance on film, but these efforts
were not successful. The studies on interobserver relationships are presented in Appendix C.

*This procedure is used to obtain the error pattern-weighted score (see p. 18). Means and
ranges of maneuver weights are given in Tables 8 and 4.
Table 10

PPDR Flight Check Scoring Agreement for Check Pilot Pairs
Compared With Classroom Standard PPDR Scoring Agreementᵃ

                              Correlations of Flight Check Scoresᵇ for
                              Check Pilot Pairs Whose Scores on Standard
                              PPDR's Correlated:
PPDR Score                      .82-.99      .91-.99      .95-.99

Intermediate PPDR
  Item-Weighted                   .17          .22          .67*
  Error Pattern-Weighted          .42*         .54*         .70*
Advanced PPDR
  Item-Weighted                   .42*         .44*         .56*
  Error Pattern-Weighted          .52*         .51*         .61*

ᵃScoring was performed on the same 10 Intermediate PPDR booklets by all check pilots.
ᵇThe symbol * indicates agreement that is significant at the .05 level of confidence. All 50 students
are represented for the Intermediate and Advanced correlations for all check pilot pairs. For check pilot
pairs with agreement of over .91, the number of students is 42 and 44 for the Intermediate and Advanced
PPDR, respectively; for pairs with agreement of over .95, the number is 23 and 33, respectively.

PPDR.  A  trend  in  the  same  direction,  but  less  pronounced,  is  shown 
for  the  Advanced  PPDR.  However,  it  must  be  remembered  that  the 
classroom method for comparing standards was based only on
Intermediate PPDR records. It would be expected that the Advanced PPDR
agreement would be much better predicted with a classroom technique
for the Advanced PPDR, which can easily be developed and applied.
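
The comparison summarized in Table 10 can be sketched as follows: keep
only the students whose two check pilots agreed in the classroom at or
above a cutoff, then correlate the two flight check scores those
students received. The records below are hypothetical, and pooling
students across pilot pairs in this way is an assumption about the
procedure rather than a documented detail.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical records, one per student:
# (classroom agreement of the student's two check pilots,
#  flight check score from the first pilot, score from the second pilot)
records = [
    (0.84, 60, 75),
    (0.86, 72, 58),
    (0.92, 65, 70),
    (0.93, 80, 74),
    (0.96, 55, 57),
    (0.97, 70, 73),
    (0.98, 85, 86),
]


def flight_agreement(records, min_classroom_r):
    """Return (number of students, flight check correlation) for pilot
    pairs whose classroom agreement is at least min_classroom_r."""
    kept = [(a, b) for r, a, b in records if r >= min_classroom_r]
    first = [a for a, _ in kept]
    second = [b for _, b in kept]
    return len(kept), pearson(first, second)


for cutoff in (0.82, 0.91, 0.95):
    n, r = flight_agreement(records, cutoff)
    print(f"classroom agreement >= {cutoff:.2f}: n={n}, flight r={r:.2f}")
```

Raising the cutoff shrinks the number of students behind each
coefficient, as the Table 10 footnote shows, so the higher correlations
rest on smaller samples.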

On the basis of these results, it seems probable that training of
check pilots in scoring standard PPDR’s can increase uniformity of
standards and consequently lead to greater reliability of the evaluation
system. Since an increase in reliability is critical to future training
methods research and to improvement of flight training by training
supervisors, still more effort should be directed toward development
of uniformity of standards among check pilots.

DEVELOPMENT  OF  THE  PPDR  SYSTEM 
FOR  OPERATIONAL  USE 

The data obtained in this study provide the basis for further
development of the PPDR system, by means of:

(1)  Refinement  of  the  PPDR  and  scoring  method. 

(2) Extension of the training of personnel serving as check
pilots. An aviator assigned to duty as a check pilot has the necessary
flight qualifications and requires only training to become qualified in
the use of PPDR's. Selection may be necessary where check standards
are extremely lenient and cannot be modified.

(3) Establishment of an information system which will provide
feedback on training results (a) to students for determining specific
areas where extra training is necessary, (b) to instructors to inform
them of specific weaknesses in their instruction, (c) to command
personnel regarding the effectiveness of the over-all program of
instruction, and (d) to check pilots, showing where, over time, their
standards are not sufficiently uniform.
Appendix A


PRIOR  RESEARCH  ON  THE  USE  OF  OBJECTIVE  MEASURES 
IN  FLIGHT  PERFORMANCE  EVALUATION 


At  least  one  research  effort  has,  with  some  success,  been  directed 
toward  attempting  to  improve  the  reliability  of  the  grade  resulting  from 
the traditional subjective system. Crawford and Dailey (s) reported a
technique  for  using  Air  Force  flight  instructors’  comments  written  on 
the  backs  of  grade  slips.  Greater  reliability  of  evaluation  resulted 
from their method than from use of the grade alone. While the
technique may be cumbersome for regular use in a training program, the
study  did  indicate  that  instructors  and  check  pilots  are  capable  of  more 
reliable  evaluation  of  student  flying  proficiency  than  they  manifest  in 
the  regular  grading  system. 

The efforts of research personnel to reduce the effects of
differences in check pilot standards and to otherwise increase the
reliability and diagnostic capacity of flight proficiency evaluation
were directed primarily toward making the evaluation system more
objective. In the systems that have been developed, research personnel
have tried to increase the extent to which the check pilot observes and
describes rather than evaluates during the actual check ride. The
larger subjective judgments are reduced to smaller specific judgments
(e.g., too much left pedal during the first take-off; over-controlled
on the third landing) or, in scoring, a subjective score is assigned to
each error rather than to the totality of errors. Description has been
assumed to be an essential characteristic of a diagnostic flight
performance evaluation, and to be fundamental to its reliability.

As early as 1939, a research attempt was made to devise a means
of obtaining more objective and detailed, as well as more reliable,
information from flight proficiency measures. The resulting Ohio
State Flight Inventory (6) was directed toward increasing the objectivity
of flight proficiency measures. Other research efforts along these
lines prior to 1947 are summarized by Ben-Avi (1), and those before
1952, by Ericksen (7). In one of the most successful studies, reported
by Gordon (9) and Nagay (^), a system of evaluation for airline pilot
proficiency was devised that depended largely on objective and detailed
in-flight records. This system provided a ride/ride reliability of .70,
one of the highest yet reported. The reliability was based on the
relationship between two successive administrations of the same check
ride to the same student by different check pilots. Of course, the
airline pilot’s activities are more procedural and require less
frequent and less gross control adjustments than do those in the
lighter aircraft on which most flight proficiency research has been
done.

A well-conceived research effort conducted for the Navy in
1952 (^) did not result in an increase in the reliability over that
obtained in the traditional system. The objective evaluation method
which was devised proved no more reliable at the pre-solo (ride/ride
relationship of .32) and instrument (ride/ride relationship of .33)
stages than the traditional subjective method (.42 and .41, respectively).
It is noteworthy that the reliability of the traditional method reported
in the Navy study was higher than in most studies. The authors
attribute the low reliability of the experimental system to day-to-day
fluctuations in student performance rather than to errors of
measurement, citing Miller (14, p. 361) for support.* “Different check
pilots” is also listed as a reason for this low reliability, along with
weather and aircraft differences. Considering other flight proficiency
research successes, these explanations hardly seem to be adequate.

It should be noted that in the Navy study there was considerable
resistance to the objective check on the part of the instructors.
Sixty-nine per cent of the instructors who participated in the tryout
considered the in-flight use of the objective booklets dangerous. This
reaction may be accounted for by the facts that (1) one of the checks
was used at the pre-solo stage*; (2) the format of the booklets in
which the check pilots recorded their observations required
considerable “head-in-cockpit” time to find out where to record; and
(3) inadequate training in the use of the booklets was given the check
pilots. However, check pilot aversion to objective checks has been
encountered to some extent in most studies.

Probably the most definitive flight evaluation work has been
accomplished by the Basic Pilot Training Research Laboratory of the
Human Resources Research Center, Air Training Command, Goodfellow Air
Force Base, San Angelo, Tex. The work described in this report was
largely based on the Air Force precedent. The developmental aspects
of the Air Force work are described by Smith, Flexman, and Houston (^)
and Smith and Flexman (^). The objective method developed was
relatively reliable in comparison with the traditional system (most
estimates of ride/ride relationships averaged above .50), but the
reliability varied considerably from one application to the next,
ranging from .17 to .67 (1^). However, the diagnostic capability of
this flight proficiency description system was of great value.
Excellent examples of its use for this purpose are presented by Flexman
et al. (s), Ornstein et al. (16), and Houston (u). In these reports,
detailed, objective information about specific errors made by students
at various stages of training was presented which demonstrated the kind
of valuable analysis which is made possible by an objective flight
evaluation system, as compared to the traditional subjective system.
In the various research efforts, increasing objectivity and
requiring subjective judgments to be more specific have usually resulted in
higher  reliability  and  almost  always  have  produced  greater  analytic 
capacity  in  comparison  with  the  traditional  method.  But  the  increases 
in  reliability  of  check  grades  have  not  been  as  great  as  is  desired,  and 
the  fluctuating  reliability  of  the  objective  check  has  plagued  researchers. 
Apparently,  the  requirement  for  check  pilots  to  attend  to  and  describe, 
or  judge  (where  description  is  not  possible),  specific  aspects  of  student 
performance  is,  of  itself,  no  guarantee  of  high  reliability.  Check  pilot 
biases  seem  to  be  manifested  in  "relatively  objective"  measures  as 
well  as  in  subjective  measures,  and  this  probably  accounts  for  low  or 
fluctuating  reliability.  Thus,  primary  attention  should  be  accorded 
the  problem  of  reducing  differences  in  check  pilot  standards  so  that 
the  more  objective  measures  can  be  used  reliably  and  for  detailed 
diagnosis  of  training  programs. 
Appendix B


RELATIONSHIPS  BETWEEN  CHECK  GRADES 
AND  TRAINING  GRADES 

IN  THE  ARMY’S  FIXED  WING  TRAINING  PROGRAM 


Table B-1

Correlations of Fixed Wing Check and Training Grades,
Camp Gary, 1957-1958*

Correlations of Fixed Wing Check and Training Grades,
Fort Rucker, 1957-1958*
Appendix C

ATTEMPTS TO STUDY INTEROBSERVER RELATIONSHIPS


Interobserver studies are traditionally an integral part of flight
proficiency measurement research (19). A method was sought in this
study for obtaining interobserver agreement data by placing two check
pilots in the same aircraft, both evaluating the student's performance
simultaneously. Unfortunately, neither the H-23 nor the H-13
helicopter, for which the PPDR’s were being developed, was capable of
safely carrying three people through several of the primary training
maneuvers, particularly with a student pilot. The H-13H, with its more
powerful engine, was used in an attempt to have two check pilots
observe an instructor pilot who simulated student performance. However,
the added weight substantially altered the performance of the aircraft
during critical maneuvers such as autorotations, maximum performance
take-offs, and steep approaches. Under high-density altitude conditions
the performance of these primary maneuvers with two passengers,
even by an expert pilot, approached being dangerous.

In  order  to  study  interobserver  agreement,  a  helicopter  (Cessna 
YH-41)  somewhat  similar  in  size  and  general  configuration  to  the 
H-23  and  H-13  and  capable  of  carrying  a  pilot  and  three  passengers, 
was  obtained  and  attempts  were  made  to  adapt  primary  maneuvers  to 
this  aircraft.  The  flight  characteristics  of  the  YH-41  were  sufficiently 
dissimilar  to  the  H-13  and  H-23  that  quite  different  procedures  were 
required  to  execute  primary  maneuvers.  Had  the  project  been  continued, 
the  results  would  probably  have  been  applicable  only  to  the  YH-41.  The 
YH-41 was experimental at that time and three successive mechanical
failures terminated the investigation. Thus, initial attempts to obtain
in-flight  interobserver  data  failed. 

If the efforts to obtain interobserver data had been successful,
there would still have been the problem of obtaining a permanent,
accurate, independent record of the actual performance. As
interobserver efforts did fail, attempts to record actual student
flight performance became even more important, particularly because of
the need to allow for comparison of actual performance records with
check pilot records.

Prior research had successfully used a series of photographs of
the instrument panel to obtain partial records of student performance (19).
HumRRO research personnel attempted to adapt to the H-13 and H-23
helicopters a camera arrangement which would photograph the
instrument panel and the horizon during flight at the same time that a
check pilot applied the experimental PPDR’s. This approach was
unsuccessful at first because of inadequate knowledge of photographic
techniques and a shortage of time, personnel, and money. A method which
did appear to work was developed too late to be included in the final
data collection phase in the summer of 1958.

Had  the  photographic  methods  been  successful,  only  about  25  per 
cent  of  the  check  items  could  have  been  recorded,  and  approximately 
four  hours  per  check  ride  would  have  been  required  of  a  trained  clerk 
to  translate  filmed  information  into  useful  data.  Because  of  budget 
limitations,  this  technique  was  not  considered  for  further  study. 
Appendix D

VARIATION AMONG CHECK PILOTS
IN SCORING THE 1958 PPDR’s


Table D-1

Means and Standard Deviations
of Percentages of Errors Scored by Check Pilots
on Selected PPDR Items and of PPDR-Derived Scores, 1958ᵃ

                                   4.6
                                  13.4   24.7
                                  29.1    6.8   38.3   12.5
Altitude                          16.3    5.4   16.6    5.6
Ground track                      16.1    5.4
All Items                         20.4    6.0   19.2    5.?

PPDR-Derived Score
  Item-weighted                   84.6    5.1   84.6    5.1
  Error pattern-weighted          70.2    6.5   70.3    6.2
  Traditionalᶜ (PPDR-based)       55.4   14.7   45.7   21.0

ᵃOf the 12 check pilots, one was not available for the Intermediate PPDR analysis, and another was
not available for the Advanced PPDR analysis. Thus 11 check pilots are represented in each statistic
in this table.

ᵇThese items constituted over half of the items on the PPDR’s.

ᶜBased on the percentage of “average” and “above average” grades given.